A regular expression (regex) is a powerful tool for matching and processing text. It is a search pattern written as a string in a specialized syntax, describing the set of strings that should match.
For example, verifying whether an input email address is valid character by character is tedious. Instead, a regular expression like:
^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$
can be used to validate it.
```python
import re

# Validate email format
email_pattern = r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$'
if re.match(email_pattern, "user@example.com"):
    print("Valid email")
```
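The check above can be wrapped in a small reusable helper. The sketch below is one possible arrangement, not a complete email validator: the names `EMAIL_PATTERN` and `is_valid_email` are hypothetical, and the pattern only checks the general shape of an address.

```python
import re

# Same simplified pattern as above, compiled once for reuse.
# EMAIL_PATTERN and is_valid_email are illustrative names, not a standard API.
EMAIL_PATTERN = re.compile(r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$')


def is_valid_email(address: str) -> bool:
    """Return True if the whole string fits the simplified email pattern."""
    # fullmatch requires the entire string to match. With re.match, `$` can
    # also match just before a trailing newline, so an address followed by a
    # newline character would be accepted.
    return EMAIL_PATTERN.fullmatch(address) is not None


candidates = [
    "user@example.com",
    "first.last+tag@mail.example.org",
    "user@localhost",            # no dot after the domain name
    "no-at-sign.example.com",    # missing @
]
for candidate in candidates:
    print(candidate, is_valid_email(candidate))
```

Using `fullmatch` here is a design choice: it keeps the check strict even if the input still carries a trailing newline from a file or form field.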
Metacharacter | Meaning | Example |
---|---|---|
. | Matches any single character (except newline) | a.c → abc , a1c |
^ | Matches the beginning of the string | ^abc → matches abcxxxx |
$ | Matches the end of the string | abc$ → matches xxxxabc |
* | Matches 0 or more repetitions of the preceding character or group | a* → "" , a , aa |
+ | Matches 1 or more repetitions | a+ → a , aa |
? | Matches 0 or 1 repetition | a? → "" , a |
{n} | Matches exactly n repetitions | a{2} → aa |
{min,} | Matches at least min repetitions | a{2,} → aa , aaa , aaaa |
{min,max} | Matches between min and max repetitions | a{2,3} → aa , aaa |
[] | Matches any one character inside the brackets | [abc] → a , b , c |
[^] | Matches any one character not in the brackets | [^abc] → d , e , f |
[-] | Indicates a range | [a-z] → a , b , ..., z |
() | Groups expressions | (abc)+ → abc , abcabc |
\| | OR operator (alternation) | abc\|xyz → abc or xyz |
\d | Matches any digit, same as [0-9] | \d → 1 , 2 , 3 |
\D | Matches any non-digit, same as [^0-9] | \D → a , @ , _ |
\w | Matches a word character: a letter, digit, or underscore ([a-zA-Z0-9_] for ASCII text) | \w → a , 1 , _ |
\W | Matches any non-word character ([^a-zA-Z0-9_] for ASCII text) | \W → @ , # |
\s | Matches any whitespace character | \s → space, \t , \n , etc. |
\S | Matches any non-whitespace character | \S → a , 1 , @ |
\b | Matches word boundaries | \bcat\b → matches cat in a sentence |
\B | Matches non-word boundaries | \Bcat\B → matches cat in scatter |
\r | Carriage return | |
\n | Newline | |
\f | Form feed | |
\t | Tab | |
\v | Vertical tab | |
\ | Escape character to treat special characters literally | \+ → + |
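A few rows of the table can be tried out with Python's `re` module. This is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

# Quantifier {2,3}: runs of two or three "a" characters.
runs = re.findall(r"a{2,3}", "a aa aaa aaaa")

# \d with +: one or more consecutive digits.
numbers = re.findall(r"\d+", "Order 66 shipped 3 items")

# \b: "cat" only as a whole word.
whole_words = re.findall(r"\bcat\b", "cat scatter concat cat.")

# \B: "cat" only when surrounded by other word characters.
inner = re.findall(r"\Bcat\B", "cat scatter concat")

# Alternation: either "abc" or "xyz".
either = re.findall(r"abc|xyz", "abc-xyz-abz")

# Grouping with +: one or more repetitions of the whole group "abc".
repeated = re.fullmatch(r"(abc)+", "abcabc") is not None

# Escaping: \+ matches a literal plus sign.
plus_signs = re.findall(r"\+", "1+1=2")

print(runs)         # ['aa', 'aaa', 'aaa']
print(numbers)      # ['66', '3']
print(whole_words)  # ['cat', 'cat']
print(inner)        # ['cat']
print(either)       # ['abc', 'xyz']
print(repeated)     # True
print(plus_signs)   # ['+']
```

Note that in `"aaaa"` the pattern `a{2,3}` takes three characters and leaves a single `a` behind, which is too short to match on its own.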
By default, regex quantifiers are greedy: they match as much text as possible while still allowing the overall pattern to succeed. Adding a ? directly after a quantifier switches it to lazy (non-greedy) matching, which matches as little text as possible.
Greedy Pattern | Description | Lazy Pattern | Description |
---|---|---|---|
.* | Match 0 or more, longest possible | .*? | Match 0 or more, shortest |
.+ | Match 1 or more, longest possible | .+? | Match 1 or more, shortest |
.? | Match 0 or 1, longest | .?? | Match 0 or 1, shortest |
.{n,m} | Match n to m times, longest | .{n,m}? | Match n to m times, shortest |
.{n,} | Match at least n, longest | .{n,}? | Match at least n, shortest |
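The difference between the two modes is easiest to see on the same input. The following sketch uses Python's `re` module with a made-up HTML-like string; the variable names are illustrative only.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the end of the string, then backtracks to the last ">".
greedy_tags = re.findall(r"<.+>", text)

# Lazy: .+? stops at the first ">" that lets the pattern succeed.
lazy_tags = re.findall(r"<.+?>", text)

# The same contrast with a bounded quantifier.
greedy_runs = re.findall(r"a{2,3}", "aaaa")
lazy_runs = re.findall(r"a{2,3}?", "aaaa")

# And with the optional quantifier: ? takes the character, ?? prefers to skip it.
greedy_optional = re.match(r"a?", "a").group()
lazy_optional = re.match(r"a??", "a").group()

print(greedy_tags)  # ['<b>bold</b> and <i>italic</i>']
print(lazy_tags)    # ['<b>', '</b>', '<i>', '</i>']
print(greedy_runs)  # ['aaa']
print(lazy_runs)    # ['aa', 'aa']
print(repr(greedy_optional), repr(lazy_optional))  # 'a' ''
```

Lazy matching is the usual choice when a pattern should stop at the nearest closing delimiter rather than the farthest one.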
Created on 5/15/2025
Updated on 5/15/2025